A systematic and meta-analytic review was conducted
of the effectiveness of school-based mental health and 
behavioral programs for low-income, urban youth. 
Applying criteria from an earlier systematic review 
(Rones & Hoagwood, 2000) of such programs for all 
populations indicated substantially fewer effective pro- 
grams for low-income, urban youth. The meta-analysis 
similarly failed to indicate effects of the typical program 
on primary outcomes. Effectiveness was evident, how- 
ever, for programs that targeted internalizing problems 
or had a broader socio-emotional focus and those deliv- 
ered to all youth (i.e., universal). In contrast, negative 
effects were apparent for programs that targeted exter- 
nalizing problems and were delivered selectively to 
youth with existing problems. Distinctive characteristics 
of low-income, urban schools and nonschool environ- 
ments are emphasized as potential explanations for the 
findings. 
More than 39.8 million people live in poverty in the
United States. Of these, more than 14 million are
youth under the age of 18 (U.S. Bureau of the Census,
2011). Youth of color are especially likely to be poor.
Poverty rates for African American youth are almost 
two and a half times those for European American 
youth and poverty rates for Latino youth are nearly 
two times those for European American youth, with 
approximately 34.4% of African American and 30.3% 
of Latino youth living in poverty (U.S. Bureau of the 
Census, 2011). Youth of color are also more likely to
live in segregated urban communities where there are 
few resources and high rates of unemployment, homelessness,
and crime (U.S. Department of Health and
Human Services, 2001).

Poverty is associated with stressors that range from 
major life events (e.g., child abuse, divorce) to chronic 
interpersonal stressors (e.g., family conflict) to daily
hassles (e.g., lack of money for transportation; Conger,
Ge, Elder, Lorenz, & Simons, 1994). Additionally, living
in urban poverty brings increased exposure to crime
and violence, particularly for adolescents (e.g., Bell &
Jenkins, 1994). For example, more than 40% of
inner-city youth have seen someone shot or stabbed (U.S.
Department of Health and Human Services, 2001). 
Traumatic and stressful experiences, in turn, have been 
established as risk factors for a range of psychological
problems (e.g., Grant et al., 2003).

It is not surprising, therefore, that low-income, urban 
youth are at heightened risk for a number of psychologi- 
cal problems (Grant et al., 2004). For example, based on 
the normative data for the Youth Self-Report (YSR;
Achenbach, 1991), approximately 5% of Grant and
colleagues’ (2004) predominantly minority sample of
low-income, urban youth were expected to score in the 
clinical range. However, significantly more adolescents 
scored in the clinical range on each of the YSR syn- 
drome subscales, including Internalizing (24% of girls; 
28% of boys), Externalizing (35% of girls; 26% of boys), 
Anxious-Depressed (7% of girls; 9% of boys), and Delin- 
quent Behavior (17% of girls; 16% of boys). 

Although low-income, urban youth are at higher 
risk for the development of psychological problems, 
they are less likely to receive help (Farmer, Stangl, 
Burns, Costello, & Angold, 1999; Garland et al., 
2005). A recent study found that nearly 80% of low- 
income youth in need of mental health services had 
not received services within the preceding 12 months 
(Kataoka, Zhang, & Wells, 2002). Additionally, those 
who do receive mental health services experience attri- 
tion rates >50% because of a number of practical and 
structural barriers such as stigma, lack of information, 
inaccessible location of services, or difficulty with transportation
(Kazdin, Holland, & Crowley, 1997). Given
these barriers and the lack of services provided to 
youth most at need, the education sector has played a 
central role as an entry point into the mental health 
system for low-income, urban youth (Farmer, Burns, 
Phillips, Angold, & Costello, 2003; U.S. Department
of Health and Human Services, 1999). 

Comprehensive health services, including services 
for mental health, were first offered in schools in the 
mid-1980s (Dolan, 1992). Since then, the Surgeon 
General’s Report (U.S. Department of Health and 
Human Services, 1999) has described schools as a key 
setting for the identification and treatment of mental 
disorders in children and youth. Data from the 2004- 
2005 national survey of 1,235 school-based health cen- 
ters (SBHCs) revealed that 65% of SBHCs have services 
provided by mental health staff and 59% of the centers 
are located within urban schools (National Assembly on 
School-Based Health Care, 2007). Nonetheless, the 
type or quality of services being offered in school set- 
tings is poorly understood (Rones & Hoagwood, 2000). 
Extant research suggests school-based mental health 
services are being delivered in a variety of ways without 
an explicit “best practice” model (Paternite, 2005). 

A systematic review of published reports of school-based
mental health services spanning 1985-1999 was completed
by Rones and Hoagwood (2000). These authors
defined school-based mental health services as “any 
program, intervention, or strategy applied in a school 
setting that was specifically designed to influence stu- 
dents’ emotional, behavioral, or social functioning.” 
The results of that review were organized around the 
target of the intervention (e.g., depression, conduct 
problems) and within each of these categories, the level 
at which the program/service intervened. Both univer- 
sal (administered to all youth) and selective/indicated 
programs (administered to youth with identified 
problems) were included; thus, both treatment and 
prevention studies were examined. 

Forty-seven studies were included in the Rones and 
Hoagwood (2000) review. Overall, 17 (36%) were classified
as effective, 17 (36%) as mixed, and nine as ineffective
(28%). More specifically, five studies focused on
emotional and behavioral problems more generally, 
with three identified as being effective and two with 
mixed results; six studies focused on depression, with 
three effective, one mixed, and two listed as not effec- 
tive; 22 studies focused on conduct problems, with 
eight identified as effective, 10 with mixed results, and 
four not effective; two studies, both effective, focused 
on stress management; finally, 12 studies focused on
substance use, with three effective, six mixed, and 
three not effective. These results suggest that there are 
a number of school-based mental health programs that 
have evidence of impact across a range of emotional 
and behavioral problems. Rones and Hoagwood (2000) 
identified five key characteristics of school-based 
programs that positively affected outcomes, service 
sustainability, and maintenance, including (a) consistent 
program implementation; (b) inclusion of parents, 
teachers, or peers; (c) use of multiple modalities; 
(d) integration of program content into general classroom
curriculum; and (e) developmentally appropriate
program components. 

The purpose of the current review is to build upon 
the Rones and Hoagwood (2000) review by systemati- 
cally examining school-based mental health services 
and programs implemented with low-income, urban 
youth, given (a) this population is at particular risk for 
developing psychological problems as a result of the 
chronic and severe stressors they face, (b) they have 
limited access to services outside a school setting, and 
(c) published reviews in this area have not focused on
the socioeconomic characteristics of study samples and 
no review to date has specifically focused on low- 
income youth living in urban settings. 

We chose to replicate the Rones and Hoagwood 
(2000) approach and reviewed interventions that 
targeted a spectrum of disorders for several reasons:
(a) this allowed us to compare our findings with theirs;
(b) low-income, urban youth have been shown to be at
heightened risk for a range of psychological problems 
(Grant et al., 2004); (c) a number of school-based 
interventions target multiple outcomes (see Rones & 
Hoagwood, 2000, and this review); and (d) there is a 
growing trend in intervention research to develop inter- 
ventions that simultaneously target a range of psycho- 
logical problems (McHugh, Murray, & Barlow, 2009). 
The current review included all studies previously iden- 
tified in the Rones and Hoagwood (2000) review that 
were conducted with low-income, urban youth, and 
updated the literature with studies published from 2000 
through 2009 that met their specific inclusion criteria. 

The current review addresses two primary questions 
in an effort to inform best practices for mental health 
services and program implementation for low-income, 
urban youth within school settings: (a) How effective 
have school-based mental health and behavioral pro- 
grams been in promoting positive outcomes for low- 
income, urban youth? More specifically, to what extent 
have these programs had a favorable impact on out- 
comes for low-income, urban youth and to what 
extent are impacts sustained over time? (b) What fac- 
tors influence the effectiveness of school-based mental 
health and behavioral programs for low-income, urban 
youth? More specifically, do program effects vary based 
on the type of problem that is targeted (e.g., conduct 
disorder vs. depression), the characteristics of the inter- 
vention (e.g., universal vs. selected), the characteristics 
of the sample (e.g., ethnic composition), or the factors 
that Rones and Hoagwood (2000) concluded were 
important based on their narrative review of the litera- 
ture on school-based mental health and behavioral pro- 
grams for general populations of youth? 

To apply the most rigorous approach to answering 
these questions, we conducted a meta-analysis of all 
studies that met our inclusion criteria and provided 
data necessary to calculate effect sizes (see Table 1). To
apply the most comprehensive approach to answering
these questions and to allow for a more direct comparison
with results of the Rones and Hoagwood (2000)
review, we also qualitatively categorized the effective- 
ness of studies that met our inclusion criteria but did 
not provide data necessary to calculate effect sizes. 

METHOD 
Inclusion Criteria 

To be included in our review, the program evaluated 
had to meet Rones and Hoagwood’s (2000) definition 
of “school-based mental health services” provided ear- 
lier in this article. Consistent with this definition, pro- 
grams designed to influence only academic 
achievement were excluded; however, academic- 
related outcomes when assessed by eligible studies were 
included (e.g., Walker, Kerns, Lyon, Bruns, &
Cosgrove, 2010). We excluded mentoring programs 
based in schools, as no interventions of this type were 
included in the Rones and Hoagwood review and 
because the effectiveness of such programs has been 
reviewed elsewhere (e.g., Wheeler, Keller, & DuBois,
2010). Consistent with Rones and Hoagwood, we 
included school-based studies with additional 
nonschool components (e.g., family intervention) and 
specifically tested whether these added components 
were associated with stronger effects. 

We also used the same methodological inclusion cri- 
teria used by Rones and Hoagwood (2000). Specifi- 
cally, only those studies that included a control group, 
standardized outcome measures, and outcomes assessed 
at baseline and postintervention were eligible. Three 
designs were considered acceptable: (a) randomized 
designs; (b) quasi-experimental designs that used multiple
sites and demographically matched samples to minimize
selection biases; and (c) multiple baseline designs
using sample cohorts as their own controls (Rones & 
Hoagwood, 2000). Although the inclusion criteria for 
control groups were not specified by Rones and 
Hoagwood, all studies with groups intended to serve as 
controls were included in this review. 

Additional inclusion criteria were as follows. First, 
studies had to be conducted with samples in the 
United States to ensure uniformity in the definition 
of low income and to avoid confounding of findings 
with cross-cultural differences in school contexts. 

Table 1. School-based mental health programs for low-income students with effect size calculations

| Study | Program Title | Total N (Treatment Group n) | Grade | Sex (% Female) | Ethnicity | Target Problem | Delivery Agent | Mean Effect Size (g), Primary Outcomes* |
|---|---|---|---|---|---|---|---|---|
| Aber et al. (1998) | Resolving Conflict Creatively Program | 5,053 (not specified) | 2nd-6th | 48% | 43% Latino; 36% African American; 16% European American; and 5% Not specified | Conduct Problems (U) | UC: teachers | Post: 0.05 |
| Abt Associates (2001) | Families and Schools Together | 382 (193) | 2nd-4th | 38% | 90% African American and 10% Not specified | Conduct Problems (S) | UC: teachers | Post: 0.05 |
| Botvin et al. (2001) | Life Skills Training | 3,621 (2,144) | 7th-8th | 53% | 61% African American; 22% Latino; 6% Asian American; 6% European American; and 5% Mixed/Not specified | Substance Use (U) | UC: teachers | Post: 0.07; F/U: 0.10 |
| Cardemil et al. (2002, 2007) | Penn Resiliency Program | I: 49 (23); II: 103 (47) | 5th-6th | I: 45%; II: 56% | I: 100% Latino; II: 100% African American | Depression (U) | R | Latino: Post: 0.93; F/U1: 0.63; F/U2: 0.97. African American: Post: 0.16 |
| Cho et al. (2005) | Reconnecting Youth | 1,218 (617) | 9th-11th | 50% | 47% Latino; 24% Asian American; 13% African American; 9% European American; and 7% American Indian/Not specified | Conduct Problems (S) | UC: teachers | Post: -0.02; F/U: -0.10 |
| Henderson et al. (1992) | Stress-Control Program | 65 (33) | 3rd | 57% | 60% African American and 40% European American | Broad Mental Health and/or Behavioral (U) | R | Post: 0.76 |
| Hostetler and Fisher (1997) | Project C.A.R.E. | 317 (163); F/U1: 187 (97); F/U2: 78 (39) | 4th | 43% | 39% Latino; 34% European American; and 27% African American | Substance Use (S) | R | Post: -0.16; F/U1: 0.16; F/U2: 0.03 |
| Jagers et al. (2007) | Aban Aya Youth Project | 789 (417) | 5th-7th | 51% | Predominately African American (% unknown) | Broad Mental Health and/or Behavioral (U) | R | Group 1: Post: 0.19. Group 2: Post: 0.18 |
| McClowry et al. (2005) | INSIGHTS into Children’s Temperament | 148 (91) | 1st-2nd | 34% | 89% African American; 9% Latino; and 2% Mixed/Not specified | Broad Mental Health and/or Behavioral (U) | R | Post: 0.61 |
| McDonald et al. (2006) | Families and Schools Together | 130 (80) | 1st-4th | 73% | 100% Latino | Broad Mental Health and/or Behavioral (U) | UC: teachers | Post: 0.63 |
| Metropolitan Area Child Study Research Group (MACS, 2002) | MACS intervention | Early Levels: A: 152 (120); B: 171 (82); C: 238 (120). Late Levels: A: 103 (52); B: 67 (37); C: 100 (49) | Early: 2nd-3rd; Late: 5th-6th | 39% | 48% African American; 37% Latino; and 15% European American | Conduct Problems (S) | UC: teachers | Early: Post A: 0.11; Post B: -0.17; Post C: -0.41. Late: Post A: -0.36; Post B: -0.50; Post C: -0.27 |
| Mufson et al. (2004) | Interpersonal Psychotherapy for Depressed Adolescents | 63 (34) | Middle and High School | 84% | 71% Latino and 29% Not specified | Internalizing Symptoms (S) | UC: school-based health clinic therapists | Post: 0.42 |
| Murray and Malmgren (2005) | N/A | 48 (24) | 9th-12th | 23% | 100% African American | Broad Mental Health and/or Behavioral (S) | UC: teachers | Post: 0.25 |
| Sinclair et al. (2005) | Check and Connect | 144 (71) | High School | 18% | 67% African American; 22% European American; and 11% Not specified | Broad Mental Health and/or Behavioral (S) | R | Post: 0.38 |
| Tolan et al. (2004) | SAFEChildren | 442 (243) | 1st | 49% | 58% Latino and 42% African American | Conduct Problems (U) | R | Post: -0.05; F/U: -0.06 |
| Weiss et al. (2003) | Reaching Educators, Children & Parents | 93 (62) | 4th | 37% | 56% African American; 38% European American; and 6% Not specified | Broad Mental Health and/or Behavioral (S) | R; outside master's-level social workers and psychiatric nurses | Post: 0.11; F/U: 0.28 |

Note. Post = Post-treatment; F/U1 = First follow-up; F/U2 = Second follow-up; U = Universal; S = Selected; UC = Usual care provider; R = Research staff.

* Effect sizes were computed so that positive values indicated differences in directions consistent with a favorable effect of the intervention group on youth outcomes (e.g., higher self-esteem, fewer symptoms of depression).


Second, the study sample had to be predominately or 
exclusively low income and urban. A study was 
judged to meet the urban portion of this criterion if 
it stated its sample was drawn from schools in an 
urban area or from a city deemed to be an urban area 
(U.S. Bureau of the Census, 2000). If a sample
included a mixed urban and suburban/rural popula- 
tion, it was excluded unless results were clearly parsed 
out for the urban youth and these youth met the 
poverty inclusion criteria. The poverty criterion was 
met if more than 60% of the sample was described as 
low income based on measures such as eligibility for 
free school lunch programs or economic indexes. 
Additionally, when this information was not reported, 
studies were included if terms such as “all,” “major- 
ity,” or “most” were used to describe the sample’s 
level of poverty or if urban poverty was discussed in
the context of an “inner-city” sample, which, as 
defined, includes both urban and impoverished elements.
However, if the city was described as impoverished
but the study sample was described as <60%
impoverished, the study was excluded. Third, as was 
the case with the Rones and Hoagwood (2000) 
review, studies had to be published in a peer- 
reviewed journal; accordingly, theses and dissertations 
were not included. Fourth, studies needed to either 
be included in the Rones and Hoagwood review or 
published in the period of 2000-2009. 

Literature Search 

Although only one study was explicitly described as 
implemented with a low-income, urban sample in text, 
our examination of the 47 articles reviewed by Rones 
and Hoagwood (2000) revealed that eight of the evalu- 
ations were conducted with a sample of predominately 
low-income, urban youth. These studies were included 
in the current review. In addition, a computerized 
search of references published between 2000 and 2009 
identified from PsycINFO and a manual search (track- 
ing citations within identified articles) were used to 
update the literature search, resulting in a review that 
spans 1985-2009. Search terms used by Rones and 
Hoagwood included “schools, children, mental health, 
services, prevention, outcomes, effectiveness, and 
specific syndromes (e.g., ADHD, depression), among 
others.” Key words for identifying studies between
2000 and 2009 included those terms in addition to the
following: psychopathology (or outcomes), youth (or 
adolescents), urban (or inner-city), low-income (or 
impoverished or poverty), and services (or intervention 
or risk reduction). Various combinations of these terms 
were used to cast the broadest net possible. 

These search procedures resulted in a final sample of 
23 articles evaluating 19 programs across 21 studies and 
29 independent samples. Most studies included both men 
and women, with two studies (Murray & Malmgren, 
2005; Sinclair, Christenson, & Thurlow, 2005) including
a predominately male (over 75%) sample. A total of 20
studies (95%) used random assignment in their design
(12 individual level; one classroom level; seven school 
level). The remaining study used a quasi-experimental 
design. Study designs included intervention conditions 
compared with no treatment controls (86%), attention/ 
placebo controls (10%), and waitlist controls (4%). 

Study Coding 

Two clinical psychology doctoral students coded the 
studies. Both were trained and supervised by doctoral-level
psychologists, one of whom completed independent
coding of selected variables (e.g., target focus of
the program), with discrepancies from original coding 
resolved by consensus. 

Target Focus. Programs were coded as primarily 
focused on conduct, depression, substance use, or 
broad mental health and/or behavioral targets. Studies 
pulled from Rones and Hoagwood’s (2000) review 
maintained the same target focus code as applied in that 
review with the exception of one study, which was
categorized as “stress management” in the Rones and 
Hoagwood review but categorized as broad mental 
health and/or behavioral in this review (Henderson, 
Kelbey, & Engebretson, 1992).

Level of Intervention. Studies were categorized as
universal if delivered to all youth; selected if delivered to
youth selected on the basis of symptoms, behaviors, or 
other criteria short of diagnosis; and indicated if provided 
to youth who met diagnostic criteria for a mental health 
disorder (Institute of Medicine, 1994). As there were 
fewer selected and indicated interventions, these catego- 
ries were combined in a selected/indicated category. 
This categorization is consistent with the one used in the
Rones and Hoagwood (2000) review. 

Change Agent. The change agent refers to the 
person who delivered the program. Change agent 
categories were researchers/research-hired or usual care 
providers (e.g., teachers). 

Ethnicity. Studies were categorized based on the 
predominant (>75%) ethnic composition of their sam- 
ple. Codes included predominantly African American 
and predominantly Latino. Although Asian American, 
American Indian, and European American students 
were enrolled in a number of eligible studies, no study 
had 75% or greater enrollment of any of these groups. 
Multiple minority was coded if at least 75% of the sam- 
ple was non-European American but no single group 
made up 75% of the sample. Multiple ethnicity was 
coded if European American youth were included in 
the sample and the combined ethnic minority enroll- 
ment was below 75% of the sample. (Note: There 
were two exceptions to this categorization. One study
comprising 71% Latino youth, with no other information
on ethnicity, was included in the predominantly Latino
group, and one study with a sample of 72% minority
youth without specification of ethnic groups was
included in the multiple minority category.)
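
The ethnicity coding rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the dictionary input format, and the treatment of "Not specified" as neither a codable group nor part of the minority total are hypothetical choices, not details reported in the review. The 75% cutoff is applied as "75% or greater," and the two hand-coded exceptions noted above are not reproduced.

```python
UNSPECIFIED = "Not specified"      # assumed label for unreported ethnicity
EUROPEAN = "European American"


def code_ethnicity(composition, threshold=75.0):
    """Return a sample-level ethnicity code.

    composition: dict mapping group name -> percent of the sample
    (a hypothetical input format chosen for this sketch).
    """
    # Ignore the unreported share when looking for a predominant group.
    specified = {g: p for g, p in composition.items() if g != UNSPECIFIED}

    # A single group at or above the threshold determines the code.
    for group, pct in specified.items():
        if pct >= threshold:
            return f"predominantly {group}"

    # Otherwise pool all specified groups other than European American.
    minority_pct = sum(p for g, p in specified.items() if g != EUROPEAN)
    if minority_pct >= threshold:
        return "multiple minority"
    return "multiple ethnicity"
```

Applied to compositions like those in Table 1, a sample that is 90% African American would be coded as predominantly African American, whereas a sample split among Latino, African American, and European American youth with no group reaching 75% would fall into one of the two multiple categories.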

Consistent Program Implementation. Implementation 
can refer to (a) fidelity, or the extent to which a pro- 
gram is delivered as it was intended; (b) dosage, or the 
number of sessions delivered or completed; or (c) qual- 
ity, or how well different program components are 
delivered (Durlak & DuPre, 2008). Studies were coded
on whether or not implementation was assessed, and, if 
so, the reported quality of the study’s implementation 
(low, adequate, high). 

Inclusion of Parents, Teachers, and Peers. Programs
were coded to indicate whether or not parents, teachers/ 
other school staff, and/or peers were incorporated. 

Multiple Modalities/ Components. Programs were 
further coded as to whether they included components 
beyond a school-based curriculum (e.g., also a family 
intervention). 


Integration Into Classroom. Integration into the 
classroom was coded based on the definition provided 
by Rones and Hoagwood (2000): “delivered as an 
integral part of the classroom curricula rather than as a 
separate and specialized lesson.”

Developmental Timing. Rones and Hoagwood 
(2000) coded the concepts and curricula based on 
whether they were developmentally appropriate. In the 
present review, an analysis of available materials sug- 
gested this task would be too speculative, so it was 
dropped. Instead, the current review examined devel- 
opmental timing of programs by using age as a moder- 
ator. Average age of participants was examined both 
continuously and as a categorized grade variable: ele- 
mentary (ages 5-10), middle school (ages 11-13), and 
high school (ages 14-18). 

Outcomes. Primary outcomes were those related 
to the target of the program. Internalizing outcomes 
were deemed primary for depression-focused programs;
externalizing outcomes were deemed primary for conduct-focused
programs; substance use outcomes were
deemed primary for substance use programs; and externalizing,
internalizing, competence/social skills, and
academic outcomes were deemed primary for broad 
mental health and/or behavioral-focused programs (see 
Table 1). All other outcomes were coded as secondary. 

Data Analytic Procedures 

Meta-Analytic Procedures. Effect sizes were com- 
puted as standardized mean differences. This involves 
taking the raw difference between treatment and control 
group means on the outcome measure at post-treatment 
and then dividing this difference by the pooled 
(weighted average) standard deviation of the measure for 
the two groups (see Cooper, 2010, formula 5.11). An 
adjustment developed by Hedges (1981) was also incor- 
porated to address bias that can arise in calculation of this 
type of effect size with small samples (Hedges’s g). Pretest
means were subtracted from post-treatment means to 
adjust for potential differences between program and 
comparison groups at baseline and thus enhance preci- 
sion in effect size estimation (Lipsey & Wilson, 2001). In 
most instances, effect sizes were computed from means 
and standard deviations on outcome measures that were 
included in the study report or made available to us by
study authors. When these were not available, effect sizes 
were estimated from relevant test statistics or their 
reported significance levels (see Rosenthal, 1994). Effect
sizes were computed so that positive values indicated dif- 
ferences in directions consistent with a favorable effect of 
the intervention group on youth outcomes (e.g., higher 
self-esteem, fewer symptoms of depression). 
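
As an illustration of the computation just described, the following minimal sketch calculates a pretest-adjusted standardized mean difference with the small-sample correction. It assumes post-treatment standard deviations are the ones pooled and uses the common approximation to the Hedges (1981) correction factor; the function name, argument names, and the higher_is_better flag are hypothetical and are not taken from the review.

```python
import math


def hedges_g(post_t, post_c, sd_t, sd_c, n_t, n_c,
             pre_t=0.0, pre_c=0.0, higher_is_better=True):
    """Pretest-adjusted standardized mean difference with Hedges's correction."""
    # Pooled (weighted average) standard deviation of the two groups.
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )
    # Difference between groups in pre-to-post change.
    diff = (post_t - pre_t) - (post_c - pre_c)
    d = diff / pooled_sd
    # Small-sample correction factor (approximate form).
    j = 1 - 3 / (4 * (n_t + n_c - 2) - 1)
    g = j * d
    # Flip the sign for measures where lower scores are favorable
    # (e.g., symptom counts), so that positive g always favors the program.
    return g if higher_is_better else -g
```

For example, with 20 youth per group, equal standard deviations of 4, no baseline difference, and a 2-point advantage for the program group at post-treatment, the uncorrected difference is 0.50 and the corrected value is slightly smaller.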

The independent sample was the primary unit of 
analysis. Because effect size information was reported 
for the overall sample in most reports, each report or 
study generally contributed one independent sample to 
the analysis. In some instances, however, studies only 
reported findings separately for nonoverlapping sub- 
groups (e.g., Latino vs. African American youth; class- 
room only vs. family also conditions). These studies 
contributed more than one sample to the analysis (i.e., 
one sample for each distinct subgroup; Cooper, 2010). 
Furthermore, some studies included two or more dis- 
tinct interventions compared against a control. In these 
instances, we computed effect sizes for each interven- 
tion-control group comparison and treated the corre- 
sponding samples as independent in study analyses. 
Finally, we also allowed studies to contribute informa- 
tion to each outcome category for which effect size 
information was available. 

Effect sizes that were more than three interquartile 
ranges above the 75th percentile or below the 25th 
percentile, and thus qualified as statistical outliers 
according to Tukey’s definition (Tukey, 1977), were 
Winsorized by setting their values to the highest or 
lowest effect size, respectively, that did not qualify as 
an outlier. Doing so provided a safeguard against 
extreme effect sizes having undue influence on our 
findings. In addition, each effect size was weighted by 
the inverse of its variance to provide more efficient 
estimation of true population effects (Hedges & Olkin,
1985). This procedure gives greater weight to larger 
samples and is the generally preferred approach 
(Cooper, 2010). When evaluations used data that were 
clustered across multiple units (e.g., schools), effect size 
weights were adjusted to account for such clustering 
using procedures recommended by Higgins and Green 
(2008). This adjustment makes use of the intra-class 
correlation (ICC) coefficient for the outcome measure. 
Because ICC values were generally not reported in
studies, we substituted our own estimate of 0.10 (What
Works Clearinghouse, 2008). 
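
The outlier handling and weighting steps above can be sketched as follows. Several details are assumptions made for illustration rather than procedures reported in the review: quartiles are taken with the "inclusive" method of Python's statistics.quantiles, the effect size variance uses a standard large-sample approximation, and the clustering adjustment inflates the variance by the design effect 1 + (m - 1) x ICC, where m is the mean cluster size. All function names are hypothetical.

```python
import statistics


def winsorize_tukey(effects, k=3.0):
    """Pull values beyond k interquartile ranges back to the nearest non-outlier."""
    q1, _, q3 = statistics.quantiles(effects, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Values inside the fences do not qualify as outliers.
    inliers = [e for e in effects if low_fence <= e <= high_fence]
    lowest, highest = min(inliers), max(inliers)
    # Outliers are set to the most extreme non-outlying value on their side.
    return [min(max(e, lowest), highest) for e in effects]


def g_variance(g, n_t, n_c):
    """Approximate sampling variance of a standardized mean difference."""
    return (n_t + n_c) / (n_t * n_c) + g ** 2 / (2 * (n_t + n_c))


def cluster_adjusted_weight(variance, mean_cluster_size, icc=0.10):
    """Inverse-variance weight after inflating the variance for clustering."""
    design_effect = 1 + (mean_cluster_size - 1) * icc
    return 1.0 / (variance * design_effect)
```

With an ICC of 0.10 and an average of 21 youth per cluster, the design effect is 3, so a clustered sample receives one third of the weight it would receive if its observations were treated as independent.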

Analyses were conducted with a random-effects 
model (Hedges & Vevea, 1998). When a random-effects
analysis is carried out, a study-level variance
component is assumed to be present as an additional 
source of random influence on effect sizes. In general, a 
random-effects analysis is more conservative because of 
the consideration of study-level variance as an addi- 
tional component of error (Lipsey & Wilson, 2001). 
The appropriateness of a random-effects model for the 
present analysis was indicated by (a) substantial variability
in the characteristics of the included programs and their
youth participants and the potential for such differences
to constitute significant sources of random error even 
after taking into account variance associated with speci- 
fied moderating variables and (b) interest in drawing 
inferences about all programs, not just those included in 
the present review (Hedges & Vevea, 1998). 
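
To make the role of the study-level variance component concrete, the sketch below pools effect sizes under a random-effects model. The review does not state which estimator of between-study variance was used; the method-of-moments (DerSimonian-Laird) estimator shown here is an assumption chosen because it is widely used, and the function name is hypothetical.

```python
import math


def random_effects_mean(effects, variances):
    """Pool effect sizes with a method-of-moments between-study variance."""
    # Fixed-effect (inverse-variance) weights and weighted mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * g for wi, g in zip(w, effects)) / sum_w

    # Q statistic: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (g - fixed_mean) ** 2 for wi, g in zip(w, effects))
    df = len(effects) - 1

    # Between-study variance (tau squared), truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau squared to each sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return mean, se, tau2
```

Because the between-study variance is added to every sample's variance, the weights become more similar across samples and the standard error of the pooled estimate is at least as large as under a fixed-effect model, which is the sense in which the analysis is more conservative.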

To test whether there was variability in sample-level 
effect sizes greater than that which would be expected by 
sampling error around a single population value, we con- 
ducted a homogeneity analysis using procedures 
described by Cooper (2010). Results of this analysis were 
used as well to calculate I², a descriptive measure of the
amount of the observed variability in effect sizes across 
studies that is attributable to study differences rather than 
sampling error (Higgins & Thompson, 2002). The 
homogeneity analysis and all moderator analyses were 
completed with post-treatment effect sizes only, given 
the limited number of studies that were assessed at fol- 
low-up time points. Because of the potential for con- 
founding among study characteristics tested as 
moderators in a meta-analysis (Cooper, 2010), for any
moderator that demonstrated a significant association 
with effect size we tested whether the association 
remained evident when controlling, in turn, for each of 
the other study characteristics tested as moderators. 
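
The homogeneity statistic and I² can be illustrated with a short sketch. It assumes the usual inverse-variance form of Q and the Higgins and Thompson (2002) definition of I² as the percentage of Q in excess of its degrees of freedom; the function name is hypothetical, and the significance test of Q against a chi-square distribution is omitted.

```python
def homogeneity(effects, variances):
    """Return Q, its degrees of freedom, and I-squared (as a percentage)."""
    w = [1.0 / v for v in variances]
    mean = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    # Q: weighted sum of squared deviations from the weighted mean.
    q = sum(wi * (g - mean) ** 2 for wi, g in zip(w, effects))
    df = len(effects) - 1
    # I-squared: share of observed variability beyond sampling error.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2
```

For two samples with effect sizes of 0.0 and 1.0 and variances of 0.1 each, Q is 5 on 1 degree of freedom, so roughly 80% of the observed variability would be attributed to differences between samples rather than sampling error.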

Qualitative Procedures. Criteria used to label studies 
as effective, mixed, and ineffective were not clearly 
defined by Rones and Hoagwood (2000). However, it appears that
effective studies reported only significant positive out- 
comes and studies listed as ineffective reported only non- 
significant and/or negative treatment effects. Detailed 
results were not provided for effective and ineffective 
studies, but they were provided for studies with mixed
results. In this mixed category, results were significant 
and positive for some but not other outcomes. 

In the present review, significant positive, nonsignificant,
and significant negative treatment effects were
used to classify all studies (including those that did not
provide data necessary for calculating effect sizes) as 
effective, ineffective, or mixed for a more direct com- 
parison with Rones and Hoagwood’s findings. Studies 
were categorized as effective if they reported >75% 
positive post-treatment outcomes related to the 
program’s focus, mixed if between 50% and 74% of 
program-focused post-treatment outcomes were 
positive, and ineffective if <50% of program-focused
post-treatment outcomes were positive. 
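
The classification rule above can be sketched as a small function. The function name is hypothetical, and one detail is an assumption: the stated cutoffs leave shares above 74% and up to exactly 75% unassigned, and this sketch places them in the mixed category.

```python
def classify_study(n_positive, n_outcomes):
    """Label a study from its share of positive program-focused outcomes."""
    if n_outcomes <= 0:
        raise ValueError("a study needs at least one program-focused outcome")
    share = n_positive / n_outcomes
    if share > 0.75:
        return "effective"
    # Assumption: shares between 74% and 75% (inclusive) are treated as mixed.
    if share >= 0.50:
        return "mixed"
    return "ineffective"


print(classify_study(4, 5))  # 80% positive
print(classify_study(3, 5))  # 60% positive
print(classify_study(1, 3))  # 33% positive
```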

RESULTS 

Results are presented by research question. The first 
question pertaining to overall program effectiveness 
was addressed using both qualitative and meta-analytic 
procedures, whereas the second question was addressed 
using meta-analysis only. Table 1 lists the interventions 
reviewed meta-analytically and provides study informa- 
tion, including sample characteristics and results. Effect 
sizes are listed for the overall sample. 1 

How Effective Have School-Based Mental Health and Behav- 
ioral Programs Been in Promoting Positive Outcomes for Low- 
Income, Urban Youth? More Specifically, to What Extent Have 
These Programs Had a Favorable Impact on Outcomes for Low- 
Income, Urban Youth and to What Extent Are Impacts Sus- 
tained Over Time? 

Qualitative Analyses. Qualitative analyses of the 29 
samples included in this review 2 resulted in five pro- 
grams classified as effective (17%), eight as mixed 
(28%), and 16 as ineffective (55%). Of the conduct- 
focused programs, no programs were deemed effective, 
three were deemed mixed, and nine were deemed 
ineffective. Of the depression-focused programs, one 
was deemed effective, one mixed, and one ineffective. 
Of the substance use-focused programs, one was
deemed effective and three ineffective with no mixed 
programs. Finally, of the general mental health and 
behavioral-focused programs, three were deemed effec- 
tive, four mixed, and three ineffective. For the univer- 
sal programs, four were effective, four mixed, and six 
ineffective. For the selected programs, one was effec- 
tive, five mixed, and 10 ineffective. 
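
The overall percentages reported for the qualitative classification follow directly from the counts. A minimal check, using the counts given above (the helper name is hypothetical):

```python
def percentage_shares(counts):
    """Convert category counts to whole-number percentages of the total."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


# Counts reported for the 29 samples in the qualitative analysis.
shares = percentage_shares({"effective": 5, "mixed": 8, "ineffective": 16})
print(shares)  # {'effective': 17, 'mixed': 28, 'ineffective': 55}
```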

Meta-Analysis. Of the 29 independent samples, 23 
had at least one calculated effect size at post-treatment, 
six at a first follow-up time point (average length of 
time 10 months), and two at a second follow-up time 
point (average length of time 12 months). No effect 
sizes could be calculated beyond the second fol-
low-up time point. The overall aggregated effect sizes
for the primary outcomes were 0.08 (95% CI −0.01 to
0.17) at post-treatment, 0.06 (95% CI −0.07 to 0.20)
at first follow-up, and 0.48 (95% CI −0.44 to 1.41) at
second follow-up. The mean difference between effect
sizes at post-treatment and first follow-up for the same
outcome measures was 0.03 (95% CI −0.08 to 0.14),
indicating maintenance of gains over time for the inter-
vention group. The mean difference between post-
treatment and second follow-up for the same outcome
measures was 0.17 (95% CI −0.04 to 0.37), indicating
additional gains over time for the intervention group; 
however, because of small sample size at this time 
point, these gains should be interpreted with caution. 
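
To make the link between a reported interval and its precision concrete, the sketch below recovers an approximate standard error and z statistic from a 95% confidence interval. It assumes a symmetric normal-theory interval (critical value 1.96), which the text does not state, and takes the post-treatment estimate as 0.08 with an interval of −0.01 to 0.17. The function names are hypothetical.

```python
def se_from_ci(lower, upper, z_crit=1.96):
    """Approximate standard error implied by a symmetric confidence interval."""
    return (upper - lower) / (2 * z_crit)


def z_statistic(estimate, lower, upper, z_crit=1.96):
    """z statistic for the estimate, using the standard error implied by the CI."""
    return estimate / se_from_ci(lower, upper, z_crit)


se = se_from_ci(-0.01, 0.17)
z = z_statistic(0.08, -0.01, 0.17)
print(f"SE = {se:.3f}, z = {z:.2f}")
```

Under these assumptions the implied standard error is about 0.046 and z is about 1.74, below 1.96, which agrees with an interval that includes zero.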

The difference at post-treatment between the aggre-
gated effect sizes across the primary outcome categories
was significant (Qb = 10.79, df = 3, p = .01). Given
the low number of studies with substance use outcomes, 
these outcomes were combined with the externalizing 
outcomes for all subsequent analyses. At post-treatment, 
outcomes assessing competence and social skills had the 
highest mean effect size (Mean ES = 0.31, 95% CI
0.13-0.50, k [number of independent samples] = 5), fol-
lowed by internalizing outcomes (Mean ES = 0.28, 95%
CI 0.05-0.51, k = 6), academic outcomes (Mean ES =
0.24, 95% CI −0.02 to 0.50, k = 4), and finally external-
izing/substance use outcomes (Mean ES = 0.02, 95%
CI −0.08 to 0.12, k = 19). Differences at follow-up time
points could not be calculated as there were too few 
studies for this analysis. 
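
A between-group statistic such as Qb is referred to a chi-square distribution with degrees of freedom equal to the number of groups minus one. The sketch below computes the upper-tail probability using only the Python standard library. The function name is hypothetical and the code is an illustration rather than the software used in the review.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum times exp(-x/2).
        terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: complementary error function plus a finite correction sum.
    tail = math.erfc(math.sqrt(half))
    correction = sum(
        half ** (j - 0.5) / math.gamma(j + 0.5)
        for j in range(1, (df - 1) // 2 + 1)
    )
    return tail + math.exp(-half) * correction


p_value = chi2_sf(10.79, 3)
print(f"p = {p_value:.3f}")
```

For Qb = 10.79 with df = 3 this gives a probability of roughly .013, in line with the reported p = .01.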

Analyses with secondary outcomes were also con-
ducted. The overall aggregated effect sizes for the sec-
ondary outcomes were 0.17 (95% CI 0.02-0.32,
k = 14) at post-treatment, 0.02 (95% CI −0.07 to
0.11, k = 6) at first follow-up, and 0.02 (95% CI
−0.25 to 0.30, k = 3) at second follow-up. The
mean difference between effect sizes at post and first 
follow-up for the same secondary outcome measures
was 0.00 (95% CI −0.06 to 0.06) and the mean differ-
ence between post-treatment and second follow-up for
the same secondary outcome measures was 0.02 (95%
CI −0.17 to 0.21), indicating neither deterioration nor
gain in effects over time for the intervention group rel- 
ative to the control group. 

Analyses comparing mean effect sizes across secondary 
outcome categories were also conducted. Studies with 
secondary externalizing outcomes (k = 2) and studies
with secondary substance use outcomes (k = 2) were
combined to increase sample size, although still resulting
in a lower than needed sample. Studies with internalizing
outcomes (k = 3) and studies with competence/social
skills outcomes (k = 6) were also combined to increase
sample size. Academic studies had a sufficient number of
samples to examine on their own as secondary outcomes
(k = 9). A marginally significant difference was found
for these outcome groups (Qb = 4.96, df = 2, p = .08),
with secondary externalizing/substance outcomes having
the lowest and negative effect size (Mean ES = −0.05,
95% CI −0.25 to 0.16), secondary internalizing/compe-
tence outcomes having the next highest yet still small
effect size (Mean ES = 0.08, 95% CI −0.09 to 0.24),
and academic outcomes having the highest, yet still small
effect size (Mean ES = 0.24, 95% CI 0.08-0.40).

What Factors Influence the Effectiveness of School-Based Men- 
tal Health and Behavioral Programs for Low-Income, Urban 
Youth? 

The homogeneity analysis revealed significant variation 
in effect sizes across samples, Q = 74.63, k = 23, 
p < .001. The corresponding I² value of 70.5% reflects
a medium to large degree of heterogeneity (Higgins & 
Thompson, 2002). These findings provide an empirical 
rationale for testing potential moderators that might 
account for this variation (Cooper, 2010). In prelimin- 
ary analyses of study methodological characteristics as 
potential moderators, it was determined that there was 
a nonsignificant trend (Qb = 1.97, df = 1, p = .16) for
effect sizes to be larger for samples in which youth
were assigned to condition at the individual level
(Mean ES = 0.18, 95% CI 0.01-0.35, k = 11) rather
than at the classroom or school level (Mean ES = 0.01,
95% CI −0.14 to 0.17, k = 12). All subsequent analyses
thus controlled for this methodological characteristic. 
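
The heterogeneity percentage can be recovered from the reported homogeneity statistic. A minimal sketch, assuming the usual definition of I² as the share of Q in excess of its degrees of freedom (the function name is hypothetical):

```python
def i_squared(q, k):
    """I-squared (in percent) from a homogeneity statistic Q and k samples."""
    df = k - 1
    if q <= 0:
        return 0.0
    # Negative values are truncated at zero by convention.
    return max(0.0, (q - df) / q) * 100.0


print(round(i_squared(74.63, 23), 1))  # 70.5
```

With Q = 74.63 and k = 23 samples (df = 22), this reproduces the reported value of 70.5%.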


Target Problem. For the test of target problem as 
moderator, internalizing-focused interventions (Mean 
ES = 0.31, 95% CI 0.02-0.61, k = 3) were combined
with those with broad mental health and/or behavioral
targets (Mean ES = 0.31, 95% CI 0.19-0.44, k = 8),
and conduct-focused interventions (Mean ES = −0.11,
95% CI −0.22 to −0.01, k = 10) were combined with
programs targeting substance use (Mean ES = 0.01,
95% CI −0.16 to 0.17, k = 2). Results indicated a sig-
nificant difference in effect size as a function of target
problem (Qb = 25.27, df = 1, p < .001), with a posi-
tive and significant effect size when the program had
an internalizing or broad/socio-emotional focus
(Mean ES = 0.32, 95% CI 0.19-0.44, k = 11) and
a negative and nonsignificant effect size when the pro-
gram had a focus on conduct problems or substance
use (Mean ES = −0.09, 95% CI −0.18 to 0.01,
k = 12). This difference remained significant in all
analyses controlling for other moderators. 

Level of Intervention. Ten samples received univer- 
sal and 13 received selected levels of intervention. 
Meta-analytic results revealed a significant difference in 
effect associated with this characteristic (Qb = 12.67, 
df = 1, p = .002), with interventions at the selected
level (Mean ES = −0.08, 95% CI −0.21 to 0.04,
k = 13) having a smaller, nonsignificant effect size rela-
tive to those at the universal level for which there was
a significant positive effect (Mean ES = 0.25, 95% CI
0.11-0.38, k = 10). This difference remained signifi-
cant in all analyses controlling for other moderators.

Change Agent. Thirteen samples received pro- 
grams delivered by usual care providers and 10 received 
programs delivered by researchers or research-hired 
staff. Of those delivered by usual care providers, 44%
of the samples received programs delivered by teachers.
The remaining programs were delivered by other
school staff such as counselors or a combination of pro-
viders. Initially, meta-analytic results revealed a signifi-
cant difference associated with this characteristic;
however, this difference was no longer significant 
when controlling for either the target problem 
addressed or the level of intervention. Use of research- 
ers or research-hired staff to deliver the intervention 
was correlated strongly with programs having an inter-
nalizing or broad socio-emotional focus (r = 0.76) and
with programs being universal rather than selected 
(r = 0.31). 

Ethnic Composition. Of the samples included in
the meta-analysis, six were primarily African American, 
three were primarily Latino, 10 were multiple minor- 
ity, and four were multiple ethnicity. Although effect 
sizes for samples including predominantly Latino youth 
(Mean ES = 0.49) were more than twice as large as 
those that included predominantly African American 
youth (Mean ES = 0.23), there were insufficient sam- 
ples in these categories to examine them using modera- 
tion analyses. Therefore, samples in which youth were 
predominantly African American and samples in which 
youth were predominantly Latino were combined and 
compared with a group made up of samples in which 
one type of race/ethnicity did not predominate. There 
was not a significant difference in effect size when 
race/ethnicity was examined as a moderator in this
way (Qb = 0.06, df = 1, p = .81). 

Consistent Program Implementation. Twelve of the 
23 evaluations were assessed for various aspects of the
implementation process. Analysis of possible differences
in effects as a function of whether assessment of program
implementation was reported (regardless of level)
revealed a significant difference (Qb = 5.30, df = 1,
p = .02), with those studies that did not report examin-
ing implementation having higher mean effect sizes
(Mean ES = 0.20, 95% CI 0.05-0.35, k = 11) than
those studies that did (Mean ES = −0.05, 95% CI −0.20
to 0.10, k = 12). This difference, however, failed to be
sustained when controlling for target problem addressed
(p = .96) and level of the intervention (p = .30), as lack
of use of measures to assess for consistent implementation
was correlated with an internalizing or broad socio-emo-
tional focus (r = 0.70) and with universal rather than
selected implementation (r = 0.15). 

Of the 12 samples with calculated effect sizes that 
examined consistent program implementation, four 
reported implementation as adequate, seven reported 
high implementation quality, and one reported low 
implementation quality. There was no significant dif- 
ference between those reporting adequate and those 
reporting high implementation quality (p = .75).


Inclusion of Parents, Teachers, and Peers. The major- 
ity of programs included parents, teachers, or peers 
(78%). There were significant differences among studies 
that were ecologically guided and those that were not 
(Qb = 5.86, df = 1, p = .016). Those that did not tar-
get the ecology of the child had higher mean effect sizes
(Mean ES = 0.36, 95% CI 0.11-0.61, k = 5) than those
that did reach beyond delivery at the individual level
(Mean ES = 0.02, 95% CI −0.10 to 0.13, k = 18). This
difference no longer reached or approached signifi-
cance, however, when controlling for target problem
addressed (p = .90) or intervention level (p = .13),
owing to a trend for programs with a conduct problems 
or substance use focus and those that were selective 
(each of which had weaker effects) to also be ecologi- 
cally guided (rs = 0.62 and 0.22, respectively). 

Use of Multiple Program Components. The majority 
of programs included program components outside of
the school-based curriculum (65%). There was no sig-
nificant difference between effect sizes for evaluations
of programs that incorporated components beyond
those that were school based and those that did not 
(Qb = 1.67, df = 1, p = .28). 

Integration of Program Content Into General Classroom 
Curriculum. Over half of the programs were deliv-
ered to samples in the classroom setting (56%). No sig-
nificant difference in effect size was found as a function
of this variable (Qb = 1.00, df = 1, p = .32). 

Developmental Timing. The majority of samples 
were made up mostly of elementary students (k = 11; 
48%) or middle school students (k = 8; 35%). Only 
four samples (17%) were high school students. Results 
revealed no main effect for age as a continuous 
(Qb = 0.03, df = 1, p = .86) or categorical variable 
(elementary vs. middle vs. high school; Qb = 0.06, 
df = 2, p = .97). 

Supplementary Analyses. In view of the independent 
contributions of both target problem addressed and level 
of intervention to prediction of effect size, it was of 
interest to examine average effects for programs that 
reflected different combinations of these two program 
characteristics. As expected, there was a significant 
difference among the four possible groups (Qb = 39.21,
df = 3, p < .001), and the strongest effects were evident
for programs that had an internalizing or broad/socio-
emotional focus and were at a universal level
(Mean ES = 0.33, 95% CI 0.22-0.45, k = 7). Programs
that had an internalizing or broad/socio-emotional focus
and were at a selected/indicated level (Mean ES = 0.20,
95% CI −0.03 to 0.43, k = 4) and those that had an
externalizing or substance use focus and were at the uni-
versal level (Mean ES = 0.06, 95% CI −0.07 to 0.19,
k = 3) had positive but nonsignificant effects, whereas
the remaining programs that had an externalizing or sub-
stance use focus and were at a selected/indicated level
had a negative and significant effect (Mean ES = −0.14,
95% CI −0.23 to −0.04, k = 9). The interaction
between the two program characteristics in predicting 
effect size was nonsignificant (p = .66). Of note also is 
that the prediction model taking into account both pro- 
gram characteristics was found to account for nearly all 
of the variance around the true population effect size
(89.4%; Raudenbush, 1994). The residual within-group 
variation, furthermore, was nonsignificant for this model 
(Qw = 24.82, df = 19, p = .17). These findings suggest 
that after accounting for these two characteristics of pro- 
grams, there was relatively little remaining variation in 
effect sizes. 
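
As an approximate consistency check on the four-group comparison, the sketch below rebuilds a between-group Q from the group means and 95% confidence intervals, read here as 0.33 (0.22 to 0.45), 0.20 (−0.03 to 0.43), 0.06 (−0.07 to 0.19), and −0.14 (−0.23 to −0.04). It assumes symmetric normal-theory intervals and fixed-effect weights. Because the published values are rounded and the reported analyses controlled for level of assignment, only rough agreement should be expected. The function name is hypothetical.

```python
def between_group_q(groups, z_crit=1.96):
    """Approximate Qb from (mean, ci_lower, ci_upper) tuples, one per group."""
    # The weight of each group mean is the inverse of its squared standard error.
    weights = [((2 * z_crit) / (upper - lower)) ** 2 for _, lower, upper in groups]
    means = [mean for mean, _, _ in groups]
    grand_mean = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    return sum(w * (m - grand_mean) ** 2 for w, m in zip(weights, means))


groups = [
    (0.33, 0.22, 0.45),     # internalizing/broad focus, universal
    (0.20, -0.03, 0.43),    # internalizing/broad focus, selected/indicated
    (0.06, -0.07, 0.19),    # externalizing/substance use focus, universal
    (-0.14, -0.23, -0.04),  # externalizing/substance use focus, selected/indicated
]
qb = between_group_q(groups)
print(f"approximate Qb = {qb:.1f}")
```

The reconstruction yields a value near 39.6, close to the reported Qb of 39.21.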

DISCUSSION 

The goal of this review was to evaluate the 
effectiveness of school-based programs published in a
25-year period between 1985 and 2009 and targeting 
socio-emotional and behavioral outcomes among 
low-income, urban children and adolescents and to 
compare the results of this evaluation with results of an 
evaluation similar in scope but focused on the broader 
child and adolescent population (Rones & Hoagwood, 
2000). Results of the current review reveal different
patterns for low-income, urban youth in comparison
with those reported for the broader population of
youth, illustrating the importance of considering con-
textual influences on treatment effectiveness. 3

The most basic difference between the results of the 
Rones and Hoagwood (2000) review, which included 
youth from all racial/ethnic, geographic, and socio- 
economic backgrounds, and the results of the current 
review focused on low-income, urban youth is that 
Rones and Hoagwood found greater evidence for 
effective school-based interventions. A direct compari- 
son of results using the same qualitative methods 
revealed that Rones and Hoagwood (2000) found 
more programs to be effective and fewer to be ineffec- 
tive than we found in our review. Results of our 
meta-analysis are consistent with this pattern, as the 
overall effect size for programs implemented with low- 
income, urban samples was very small (i.e., only 0.08 
at post-treatment). Although a meta-analysis compara- 
ble in scope has not been conducted with the broader 
population of youth, these effects fall well below the 
medium to large effect sizes that have been reported 
for school-based prevention and intervention programs 
targeting youth from all backgrounds with established 
mental health problems (Reddy, Newman, Thomas, & 
Chun, 2009) and for psychotherapy treatments admin- 
istered to the broader youth population (Weisz, Weiss, 
Han, Granger, & Morton, 1995). The results of this
review highlight the need to more systematically evalu-
ate the impact of socioeconomic factors on program
development, mode of delivery, and treatment efficacy. 
Unfortunately, it has been reported that over 70% of 
all treatment outcome studies do not report on 
the socioeconomic characteristics of the participants 
(Weisz, Doss, & Hawley, 2005). 

Of the 23 samples examined, only six provided fol- 
low-up data, and the average effect size at this time point 
was slightly smaller than at post. Only two studies 
reported effects at a second follow-up time point; there- 
fore, interpretation of findings is limited. Nevertheless, 
analyses of differences for the same measures across time 
points revealed maintenance of gains over time for the 
intervention group in comparison with the control. 

In an effort to understand what factors might be con- 
tributing to reduced program effectiveness for low- 
income, urban youth, we examined program target, level 
of intervention, provider of services, and race/ethnicity 
as possible moderators in our meta-analysis. With regard 
to target, results of moderation analyses revealed a signif- 
icant difference in effect size between programs focused 
on externalizing problems (substance use and conduct 
problems) versus those focused on internalizing problems 
(depression and broad mental health and behavioral 
targets), with externalizing focused programs having a 
negative differential effect between treatment and 
control groups and internalizing focused programs hav-
ing much stronger and close to medium-sized differential
treatment effects. Comparison of effect sizes for second-
ary outcomes revealed a similar pattern, with negative
effects found for externalizing outcomes and positive
effects found for internalizing/competence outcomes
and academic outcomes. 4 Results of qualitative analyses
also revealed similar effects. No programs specifically
focused on conduct problems were deemed effective,
25% reported mixed results, and 75% were not effective.
In contrast, samples included in Rones and Hoagwood’s 
(2000) review (excluding low-income, urban samples) 
found 42% of conduct-focused programs to be effective 
and 21% not effective, with the remaining reporting 
mixed effects. 

One interpretation of this pattern is that stressors 
endemic to urban poverty, such as exposure to com-
munity violence (Cooley-Quille, Boyd, Frantz, &
Walsh, 2001), pull for externalizing problems, in par- 
ticular (Grant et al., 2004), and interact with treatment 
to limit effectiveness for this outcome. A number of 
studies have reported high rates of externalizing prob- 
lems relative to internalizing problems within the con- 
text of urban poverty (Grant et al., 2004). 
Furthermore, some emerging evidence suggests that 
exposure to community violence more strongly pre- 
dicts externalizing relative to internalizing problems 
within this context (Grant et al., 2011). Scholars have
theorized that externalizing problems may provide ben- 
efits within the context of urban poverty that include 
self-esteem, power, prestige, and protection from vic- 
timization (Cassidy & Stevenson, 2005). Additional 
research is needed to test the extent to which potential 
benefits associated with externalizing outcomes limit 
the effectiveness of interventions focused on this out- 
come for low-income, urban youth. 

The prevalence of externalizing problems in low- 
income, urban settings may be the reason for the pau- 
city of programs (which met inclusion criteria for our 
review) that focused on internalizing symptoms. Only 
two programs focused on depression, and no programs 
targeted other internalizing problems such as general-
ized anxiety or symptoms of post-traumatic stress disor-
der (PTSD). 5 Instead, most programs focused on
conduct problems, substance use, or emotional/behav- 
ioral difficulties more generally, although one did 
examine a treatment for concurrent internalizing and 
externalizing problems (Weiss, Harris, Catron, & Han, 
2003). Given the established risk for trauma exposure 
among youth in urban centers of the United States 
(Kliewer et al., 2006), it is problematic that programs 
addressing PTSD and other internalizing symptoms 
among low-income, urban youth have not been sys- 
tematically evaluated. Furthermore, the results of the 
current review suggest that these types of outcomes 
may be relatively amenable to treatment. 

Results of moderation analyses with level of inter- 
vention as the moderator revealed an overall signifi- 
cant difference, with universal programs having 
positive effects and targeted programs having negative 
effects. Furthermore, supplemental analyses revealed 
that programs with an externalizing or substance use 
focus which were at a targeted/indicated level had sig- 
nificantly more negative effects than those adminis- 
tered at a universal level (which had positive but 
nonsignificant effects). This pattern is consistent with 
prior research showing that conduct-disordered youth 
may escalate their problem behavior in the context of 
interventions delivered in groups with similar peers 
(e.g., Dishion, McCord, & Poulin, 1999). At least
two explanations have been provided for such “devi- 
ancy training” in the literature. The first is that 
youths’ deviant behavior may be actively reinforced 
by externalizing peers’ laughter, social attention, and 
interest. Another explanation is that youth may derive 
meaning and value from the material discussed in the 
program, which may pique interest or provide 
motivation to commit delinquent acts in the future 
(e.g., Dishion et al., 1999). 

Prior reviews focusing on depression prevention 
programs have generally found that targeted programs 
are more effective than universal programs (Horowitz 
& Garber, 2006). However, universal school-based
programs focused on youth anxiety have been found to
be as effective or more effective than their targeted
counterparts (Neil & Christensen, 2009). Reviews of 
school-based prevention programs for externalizing 
problems have similarly yielded inconsistent results 
regarding this issue. A recent meta-analysis of school- 
based randomized controlled trials focused on violence 
prevention did not find significant effects for either 
universal or targeted programs (Park-Higgerson,
Perumean-Chaney, Bartolucci, Grimley, & Singh,
2008). Wilson and Lipsey (2007), on the other hand, 
synthesized the findings of 249 experimental and quasi- 
experimental trials of school-based programs for 
aggressive and disruptive behavior and found
significant positive effects for both universal and
targeted interventions. Consistent with the findings in
this meta-analysis, Wilson and Lipsey also found that,
among universal programs, those conducted with 
low-income samples had stronger effects, relative to 
those with mixed-income or middle-class samples. 
Income was not associated with differential effects 
when examined in the selected/targeted programs. 

Results of moderation analyses with change agent as 
the moderator failed to reveal an overall significant dif- 
ference as a function of who administered the interven- 
tion once target problem and level of the intervention 
were controlled. This is not consistent with some prior 
research. In particular, Payton et al. (2008), in their 
meta-analysis of social-emotional learning programs, 
found stronger overall effects for teacher-implemented 
programs than for researcher-implemented programs 
among a more general sample of youth. The contrast- 
ing findings for the current sample of low-income, 
urban youth may indicate another contextually specific 
influence. Urban poverty not only affects individuals 
directly but also influences them by compromising sys- 
tems located in low-income communities (Conger 
et al., 1994). For example, schools in low-income, 
urban neighborhoods are often under-funded, under-
resourced, and poorly functioning (e.g., Anyon, 1995).
These systemic stressors, in turn, affect those individuals 
working within the school system (e.g., leading to high
teacher stress, burnout, and turnover; e.g., Abel & 
Sewell, 1999). Such processes might limit the ability of 
individuals working within those systems to effectively 
implement interventions. 

Results of moderation analyses with sample 
race/ethnicity as the moderator revealed no significant 
difference as a function of race/ethnicity. It is impor- 
tant to note, however, that low sample size for specific 
groups of youth (e.g., predominantly Latino youth) 
kept us from fully testing race/ethnicity as a moderator. 
In fact, as a result of sample size constraints, we were 
forced to combine two groups with substantially differ- 
ent effect sizes (i.e., predominantly African American 
and Latino youth). Our resulting finding, therefore, 
that effect sizes for programs in which participating 
youth were predominantly of a single race/ethnicity 
did not differ from effect sizes for programs in which 
participating youth were of multiple racial/ethnic back- 
grounds does not provide us with information about 
possible differences in effects as a function of member- 
ship in specific racial/ethnic groups. Results of our 
analyses do indicate no evidence that including youth 
of predominantly one race/ethnicity, in general, is 
associated with intervention outcomes. 

An examination of the five factors identified by 
Rones and Hoagwood (2000) as affecting outcomes, 
service sustainability, and maintenance revealed the fol- 
lowing: (a) approximately half of the studies reported 
on implementation quality and, of these, no differential
treatment effects emerged related to studies with high 
versus adequate implementation quality; (b) the major- 
ity of the programs in the current review included par- 
ents, teachers, and/or peers (78%), but this variable 
failed to function as a significant moderator once target 
problem and level of intervention were controlled; 
(c) a large portion (65%) of interventions included 
multiple components that reached beyond the school, 
but there were no significant differences between pro- 
grams that did and did not do this; (d) over half of the 
programs were delivered in the classroom setting 
(56%), but there were no differences as a function of 
this variable; and (e) there were no differential effects 
based on age. A brief discussion of these findings 
follows. 

In contrast with findings reported by Rones and 
Hoagwood (2000), significant differences between 
programs as a function of implementation quality did 
not emerge, and it is not clear why this was the case. 
Two possible explanations are provided. First, it may 
be that our measure of implementation, which was 
dependent upon investigators’ reports of implementa- 
tion assessment, was not valid. In other words, some 
investigators might have rigorously ensured fidelity but 
not reported it in their publication. An alternative 
explanation is that rigorous implementation of pre- 
scribed treatments may not always be warranted within 
the context of urban poverty. Perhaps flexible delivery 
and modification of treatment in real time bring bene- 
fits that temper the potential benefits of meticulous 
implementation. Additional research is needed to test
these hypothesized interpretations. 

Also not consistent with Rones and Hoagwood’s 
(2000) results were three findings related to ecological 
aspects of the interventions. Neither inclusion of par- 
ents, teachers, peers, nor use of multiple program com- 
ponents nor integration of program content into the 
classroom was found to influence program effective- 
ness. Although inconsistent with Rones and 
Hoagwood’s results with majority youth, these findings 
are consistent with one another. They also are consis- 
tent with the finding discussed earlier that (in contrast 
with previously reported findings with majority 
samples; Payton et al., 2008) implementation by indig- 
enous providers (e.g., teachers) was not associated with 
larger effect sizes relative to implementation by research 
staff. Taken together, these findings fit with the notion 
(also described earlier) that urban poverty not only 
affects youth directly but also influences them by com- 
promising systems that affect them, such as families and 
schools (e.g., Abel & Sewell, 1999; Anyon, 1995;
Gutman, McLoyd, & Tokoyawa, 2005). As mentioned,
such processes may limit the ability of family members 
and school staff working within those systems to 
effectively implement interventions. 

Finally, insufficient information was available to ade- 
quately assess the extent to which program components 
were developmentally appropriate; therefore, this vari- 
able was not analyzed in the same way it was in the 
Rones and Hoagwood (2000) review. Analyses of age 
as a moderator, however, revealed no significant 
effects, suggesting that developmental differences did 
not drive treatment effects in this sample of studies. 

Limitations 

One limitation of the current review is that our commit- 
ment to replicating the inclusion criteria used by Rones 
and Hoagwood (2000) meant that we included only 
published studies. This may have resulted in a bias 
toward significant findings (Reed, 2009). A second limi-
tation that might also contribute to Type I error and
affects all reviews is that authors might not have reported 
on all outcomes examined. In particular, authors may 
have been more likely to report positive and significant 
findings. Taken together, these limitations suggest that 
the current review’s estimates of the percentage of effec- 
tive programs and of the effect sizes for these programs 
may be inaccurately high. Given that relatively few pro- 
grams were found to be effective and the overall mean 
effect size for programs reviewed was very low, with 
confidence intervals indicating the effect may actually fall 
in the negative range, this limitation underscores the 
need for additional intervention development/modifica- 
tion work with low-income, urban youth. 

Additionally, it is important to note that, despite the 
reported growth of SBHCs, the evaluations included
in this review were not of services provided at
these centers, but were instead of programs developed 
by researchers and delivered in school settings. One, 
however, included an evaluation of a treatment inter- 
vention for depression (Interpersonal Psychotherapy for 
Adolescents; IPT-A), as delivered by staff at an existing 
school-based mental health center versus treatment as 
usual (Mufson et al., 2004), and one was an evaluation 
of an intervention that was developed as part of the
process of establishing a series of school-based mental 
health clinics (Weiss et al., 2003). The lack of evalua- 
tions of existing school-based mental health service 
delivery models suggests that rigorous research studies 
are simply not being conducted on mental health ser- 
vices delivered within school settings. 6 This represents 
an important gap in the research literature. 

IMPLICATIONS FOR RESEARCH AND PRACTICE 

Results of the current review raise questions about the 
validity of current models of change with underserved 
populations and highlight the need for additional inter- 
vention development work for youth living in urban 
poverty. The extremely small overall mean effect size
across studies, the limited number of positive findings 
when directly compared with the Rones and 
Hoagwood (2000) results using the same qualitative 
approach, and the disconcerting presence of potential neg- 
ative effects indicate that we remain very far from provid- 
ing effective school-based services to low-income, 
urban youth. 

Moderation findings provide leads for where to 
begin with this work. For example, basic and interven- 
tion development research should examine contextual 
processes that may pull for externalizing outcomes, in 
particular, and interact with treatment variables to limit 
positive effects for this outcome. Such work might 
include the use of process observations, interviews, and
focus groups, as well as quantitative analyses in order to 
develop innovative logic models for intervention that 
take influential contextual processes into account. 

Until additional school-based interventions are 
developed, modified, and/or implemented effectively 
with this population, results of the current review sug- 
gest that caution should be used when implementing 
school-based interventions with low-income, urban 
youth that target conduct disorder or substance use 
symptoms, especially among youth already displaying 
externalizing problems. In addition, clinicians can refer 
to Table 1 for a small list of programs that were gener- 
ally found to have larger effect sizes, and a larger list of 
programs that reported positive effects for some specific 
outcomes. The table also provides a list of interventions 
to avoid because of the potential for negative effects. 1 

In conclusion, the current review found limited 
evidence of effective school-based interventions for 
low-income, urban youth, especially for externalizing 
outcomes. Nonetheless, relative to other mental health
service delivery programs, school-based services have
been shown to be more accessible and engaging,
particularly for low-income, urban youth (Atkins
et al., 2006). This, coupled with promising results for a
handful of interventions, highlights the potential of 
school-based mental health programs to meet the needs 
of youth living in urban poverty. 

NOTES 

1. A table outlining results of studies for which effect sizes 
could not be calculated is available from the first author. 

2. Other controlled evaluations of school-based programs 
administered to low-income, urban youth (e.g., Dilworth,
Mokrue, & Elias, 2002; Heydenberk & Heydenberk, 2005;
Mayers, Hager-Bundy, & Buckner, 2008; Roberts, White, & 
Yeomans, 2004; Weisz et al., 2001) were excluded for lack 
of pre-post evaluation methods or lack of standardized out- 
comes, or for design issues (not randomized, quasi-experi- 
mental, or multiple baseline). 

3. Weisz et al. (2005) summarized the youth treatment 
outcome literature across settings and conditions and found 
that very few reported sample demographic characteristics.
Future reviews rely on these data in order to evaluate the 
performance of these interventions with youth of diverse 
backgrounds, including those with few economic resources in 
particular. 


4. Program effects for primary, targeted outcomes were 
smaller than for secondary outcomes (although even these 
remained small at post-treatment and deteriorated further 
over time). Perhaps some youth in group interventions 
undermine explicit directions provided by the intervention in 
order to elicit attention from peers (Dishion et al., 1999) but
unwittingly benefit from other aspects of the intervention 
that affect outcomes not overtly identified. 

5. Although there have been published studies on PTSD 
among urban youth, primarily by the Cognitive Behavioral 
Intervention for Trauma in Schools (CBITS) program (i.e., 
Kataoka et al., 2003), studies were excluded when samples
did not meet the low-income inclusion criteria. 

6. Other published evaluations of these services with this 
population (e.g., Robinson, Harper, & Schoeny, 2003) were
not included because of lack of pre-post designs. 